GitHub Repo: https://github.com/kennethkn/congresstweets-analysis
This project aims to foster a boarder understanding of the bipartisan U.S. politics by analyzing the tweets of U.S. Congress members, including Democratic and Republican senators and representatives. The importance of it lies in the potential to reveal patterns and trends in the political discourse of recent years. Understanding these patterns can provide insights into the priorities and strategies of the two parties. Additionally, analyzing the sentiment of tweets can reveal their stance on current issues.
The data used in this analysis is a collection of tweets from U.S.
Congress members, available on GitHub at https://github.com/alexlitel/congresstweets. Given the
enormous size of the dataset (~4M entries), I have chosen a database
approach to store and query the data. The database is hosted locally on
my computer via PostgresSQL, but you can reproduce the database by
executing Python scripts in the scripts folder, which holds
scripts for database construction as well as text mining. For more
information on how to use them, please refer to README.
This R Markdown document will query the database to perform the analysis
and generate the plots.
First, we need to load the necessary libraries and connect to the database.
library(DBI)
library(RPostgres)
library(ggplot2)
library(tidytext)
library(ggpattern)
readRenviron(".env")
DATABASE_URL <- Sys.getenv("DATABASE_URL")
url_parts <- unlist(strsplit(DATABASE_URL, "://|@|:|/"))
driver <- url_parts[1]
user <- url_parts[2]
host <- url_parts[3]
port <- url_parts[4]
dbname <- url_parts[5]
con <- dbConnect(RPostgres::Postgres(), dbname = dbname, host = host, port = port, user = user)
D_HEX <- "#00AEF3"
R_HEX <- "#E81B23"
NA_HEX <- "grey"
HOUSE_HEX <- "#F8766D"
SENATE_HEX <- "#00BFC4"
Let’s start by exploring the structure of the database.
# List tables
dbListTables(con)
## [1] "members" "accounts" "tweets" "alembic_version"
# Row count and column names for "tweets" table
dbGetQuery(con, "SELECT COUNT(*) FROM tweets")
## count
## 1 4310843
dbGetQuery(con, "SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'tweets'")
## column_name data_type
## 1 sentiment_score double precision
## 2 time timestamp without time zone
## 3 link character varying
## 4 text character varying
## 5 source character varying
## 6 hashtags ARRAY
## 7 mentions ARRAY
## 8 retweeting character varying
## 9 cleaned_text character varying
## 10 tokens ARRAY
## 11 id character varying
## 12 quoting character varying
## 13 account_id character varying
# Row count and column names for "members" table
dbGetQuery(con, "SELECT COUNT(*) FROM members")
## count
## 1 730
dbGetQuery(con, "SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'members'")
## column_name data_type
## 1 id integer
## 2 name character varying
## 3 chamber character varying
## 4 party character varying
# Row count and column names for "accounts" table
dbGetQuery(con, "SELECT COUNT(*) FROM accounts")
## count
## 1 1487
dbGetQuery(con, "SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'accounts'")
## column_name data_type
## 1 member_id integer
## 2 id character varying
## 3 handle character varying
## 4 account_type character varying
## 5 prev_handles ARRAY
Now, let’s take a look at the first few rows of each table.
dbGetQuery(con, "SELECT * FROM tweets LIMIT 6")
## id account_id time
## 1 1004924628584652801 41363507 2018-06-07 23:14:36
## 2 1005541414275506176 47747074 2018-06-09 16:05:29
## 3 1006194503823384576 248699486 2018-06-11 11:20:38
## 4 1006601144259317761 381152398 2018-06-12 14:16:29
## 5 1006541101862449152 248699486 2018-06-12 10:17:53
## 6 1007004577948766208 18137749 2018-06-13 16:59:35
## link
## 1 https://www.twitter.com/DeanHeller/statuses/1004924628584652801
## 2 https://www.twitter.com/brianschatz/statuses/1005541414275506176
## 3 https://www.twitter.com/RepJeffDenham/statuses/1006194503823384576
## 4 https://www.twitter.com/RepTerriSewell/statuses/1006601144259317761
## 5 https://www.twitter.com/RepJeffDenham/statuses/1006541101862449152
## 6 https://www.twitter.com/ChrisVanHollen/statuses/1007004577948766208
## text
## 1 #vegasborn
## 2 ! http://pbs.twimg.com/media/DfRmllaVMAAuKPZ.jpg
## 3 http://pbs.twimg.com/media/Dfa4jtkWsAAvTmt.png
## 4 http://pbs.twimg.com/media/DfgqYm_WkAUBzip.jpg
## 5 http://pbs.twimg.com/media/DffztXzW0AIZ5em.png
## 6 https://www.pscp.tv/w/bfF9NzF4a2pEWURHVmFlRXp8MWxQS3FrcmFNcEFKYpSRL-75_Xnp5Vp6sLudNJ1rwlAakc3XZdH1N_y9v_Es
## source hashtags mentions retweeting sentiment_score
## 1 Twitter for iPhone {#vegasborn} {} <NA> 0
## 2 Twitter for iPhone {} {} <NA> 0
## 3 TweetDeck {} {} <NA> 0
## 4 Twitter Web Client {} {} <NA> 0
## 5 TweetDeck {} {} <NA> 0
## 6 Periscope {} {} <NA> 0
## cleaned_text tokens quoting
## 1 #vegasborn {vegasborn} <NA>
## 2 ! {} <NA>
## 3 {} <NA>
## 4 {} <NA>
## 5 {} <NA>
## 6 {} <NA>
dbGetQuery(con, "SELECT * FROM members LIMIT 6")
## id name chamber party
## 1 188 Don Young house R
## 2 189 Bradley Byrne house R
## 3 190 Felix Moore house R
## 4 191 Gary Palmer house R
## 5 192 Jerry Carl house R
## 6 193 Martha Roby house R
dbGetQuery(con, "SELECT * FROM accounts LIMIT 6")
## id handle account_type prev_handles member_id
## 1 37007274 repdonyoung office <NA> 188
## 2 2559398984 DonYoungAK campaign <NA> 188
## 3 42481696 BradleyByrne campaign <NA> 189
## 4 2253968388 RepByrne office <NA> 189
## 5 74054398 BarryMoore campaign <NA> 190
## 6 1339006078688825344 RepBarryMoore office <NA> 190
We can join the tables to get more information about the authors of the tweets.
dbGetQuery(con, "SELECT * FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id LIMIT 6")
## id account_id time
## 1 1007004577948766208 18137749 2018-06-13 16:59:35
## 2 1007335177838845953 19407835 2018-06-14 14:53:16
## 3 1008024813384949765 58579942 2018-06-16 12:33:38
## 4 1009479861910605824 234014087 2018-06-20 12:55:28
## 5 1009508199190257664 234014087 2018-06-20 14:48:05
## 6 1009873147817013253 235271965 2018-06-21 14:58:15
## link
## 1 https://www.twitter.com/ChrisVanHollen/statuses/1007004577948766208
## 2 https://www.twitter.com/RepMikeCoffman/statuses/1007335177838845953
## 3 https://www.twitter.com/mattgaetz/statuses/1008024813384949765
## 4 https://www.twitter.com/RepWilson/statuses/1009479861910605824
## 5 https://www.twitter.com/RepWilson/statuses/1009508199190257664
## 6 https://www.twitter.com/RepStevePearce/statuses/1009873147817013253
## text
## 1 https://www.pscp.tv/w/bfF9NzF4a2pEWURHVmFlRXp8MWxQS3FrcmFNcEFKYpSRL-75_Xnp5Vp6sLudNJ1rwlAakc3XZdH1N_y9v_Es
## 2 http://pbs.twimg.com/media/DfrFwvqXcAYzxj8.jpg
## 3 @WKRG MC
## 4 http://pbs.twimg.com/media/DgFP93jWsAEMYBJ.jpg
## 5 http://pbs.twimg.com/media/DgFP93jWsAEMYBJ.jpg
## 6 http://pbs.twimg.com/media/DgPKRILWAAE8JlG.jpg
## source hashtags mentions retweeting sentiment_score cleaned_text
## 1 Periscope {} {} <NA> 0
## 2 Twitter Web Client {} {} <NA> 0
## 3 Twitter for iPhone {} {WKRG} <NA> 0 @WKRG MC
## 4 Media Studio {} {} <NA> 0
## 5 Media Studio {} {} <NA> 0
## 6 Twitter Web Client {} {} <NA> 0
## tokens quoting id..14 handle account_type prev_handles
## 1 {} <NA> 18137749 ChrisVanHollen office <NA>
## 2 {} <NA> 19407835 RepMikeCoffman office <NA>
## 3 {wkrg,mc} <NA> 58579942 mattgaetz campaign <NA>
## 4 {} <NA> 234014087 RepWilson office <NA>
## 5 {} <NA> 234014087 RepWilson office <NA>
## 6 {} <NA> 235271965 RepStevePearce office <NA>
## member_id id..19 name chamber party
## 1 847 847 Chris Van Hollen senate D
## 2 287 287 Mike Coffman house R
## 3 322 322 Matt Gaetz house R
## 4 312 312 Frederica Wilson house D
## 5 312 312 Frederica Wilson house D
## 6 552 552 Steve Pearce house R
What is the time range of the tweets?
dbGetQuery(con, "SELECT MIN(time), MAX(time) FROM tweets")
## min max
## 1 2017-06-21 00:02:47 2023-07-11 17:00:02
We have tweets from June 2017 to July 2023. Since the data for 2017 and 2023 are incomplete, we will focus on the period from 2018 to 2022 for any time series analysis, which includes the 2nd session of the 115th Congress to the 1st session of the 117th Congress.
For reference: 115th Congress (2017-2019): Republican majority in both chambers 116th Congress (2019-2021): Democratic majority in the House, Republican majority in the Senate 117th Congress (2021-2023): Democratic majority in both chambers
Let’s also find out the possible values for categorical columns.
dbGetQuery(con, "SELECT DISTINCT chamber FROM members")
## chamber
## 1 senate
## 2 house
dbGetQuery(con, "SELECT DISTINCT party FROM members")
## party
## 1 <NA>
## 2 R
## 3 D
dbGetQuery(con, "SELECT DISTINCT account_type FROM accounts")
## account_type
## 1 office
## 2 campaign
How many tweets are we dealing with?
dbGetQuery(con, "SELECT COUNT(*) FROM tweets")
## count
## 1 4310843
There are a total of 4,310,843 tweets posted by U.S. Congress members from June 2017 to July 2023.
Table of Contents:
Let’s start by counting the number of tweets by party affiliation of the tweet author.
tweet_count_by_party <- dbGetQuery(con, "SELECT party, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id GROUP BY party")
tweet_count_by_party$percentage <- tweet_count_by_party$count / sum(tweet_count_by_party$count) * 100
tweet_count_by_party
## party count percentage
## 1 D 2504716 58.1026959
## 2 R 1763583 40.9103973
## 3 <NA> 42544 0.9869067
pie(tweet_count_by_party$count, labels = paste0(c("Democratic Party", "Republican Party", "Independent"), " (", round(tweet_count_by_party$percentage, 2), "%)"), main = "Tweets of U.S. Congress Members, by Party of Tweet Author, 2017-2023", col = c(D_HEX, R_HEX, NA_HEX))
Democratic Congress members have tweeted the most from 2017 to 2023, accounting for nearly 60% of all tweets, followed by Republican members at around 40%. Independent members have tweeted the least at less than 1%.
Now let’s see how the number of tweets has evolved over time.
tweet_count_by_party_2018 <- dbGetQuery(con, "SELECT party, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2018-01-01' AND time < '2019-01-01' GROUP BY party")
tweet_count_by_party_2019 <- dbGetQuery(con, "SELECT party, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2019-01-01' AND time < '2020-01-01' GROUP BY party")
tweet_count_by_party_2020 <- dbGetQuery(con, "SELECT party, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2020-01-01' AND time < '2021-01-01' GROUP BY party")
tweet_count_by_party_2021 <- dbGetQuery(con, "SELECT party, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2021-01-01' AND time < '2022-01-01' GROUP BY party")
tweet_count_by_party_2022 <- dbGetQuery(con, "SELECT party, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2022-01-01' AND time < '2023-01-01' GROUP BY party")
tweet_count_by_party_2018$year <- 2018
tweet_count_by_party_2019$year <- 2019
tweet_count_by_party_2020$year <- 2020
tweet_count_by_party_2021$year <- 2021
tweet_count_by_party_2022$year <- 2022
tweet_count_by_party_ts <- rbind(tweet_count_by_party_2018, tweet_count_by_party_2019, tweet_count_by_party_2020, tweet_count_by_party_2021, tweet_count_by_party_2022)
tweet_count_ts <- aggregate(count ~ year, tweet_count_by_party_ts, sum)
tweet_count_ts
## year count
## 1 2018 638951
## 2 2019 721569
## 3 2020 882284
## 4 2021 738086
## 5 2022 721410
tweet_count_by_party_ts
## party count year
## 1 D 367343 2018
## 2 R 263391 2018
## 3 <NA> 8217 2018
## 4 D 454744 2019
## 5 R 256210 2019
## 6 <NA> 10615 2019
## 7 D 538642 2020
## 8 R 332261 2020
## 9 <NA> 11381 2020
## 10 D 412416 2021
## 11 R 321545 2021
## 12 <NA> 4125 2021
## 13 D 395843 2022
## 14 R 322335 2022
## 15 <NA> 3232 2022
ggplot(tweet_count_ts, aes(x = year, y = count)) +
geom_point() +
geom_line() +
scale_y_continuous(labels = scales::comma) +
labs(title = "Tweets of U.S. Congress Members, by Year, 2018-2022", x = "Year", y = "Number of Tweets") +
theme_minimal()
ggplot(tweet_count_by_party_ts, aes(x = year, y = count, fill = party)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = c(D_HEX, R_HEX, NA_HEX), labels = c("Democratic", "Republican", "Independent")) +
scale_y_continuous(labels = scales::comma) +
labs(title = "Tweets of U.S. Congress Members, by Party of Tweet Author and Year, 2018-2022", x = "Year", y = "Number of Tweets", fill = "Party") +
theme_minimal()
The line plot shows the number of tweets peaked in 2020, aka election year, following progressive growth from 2018. Notably, it did not fall back to 2018 levels after the election, suggesting that the members have continued to use X as a platform to communicate with the public after enthusiasm from the election.
The bar plot shows that the Democratic members have tweeted more than the Republican members in each year. Interestingly, the number of tweets from Republican members sustained at a significantly higher level than in 2018 following an abrupt increase in 2020, while the same cannot be said for the Democratic members, suggesting that the Republican members are the main drivers of the overall increase in the number of tweets after the election.
Next, we do the same, but by chamber of the tweet author.
tweet_count_by_chamber <- dbGetQuery(con, "SELECT chamber, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id GROUP BY chamber")
tweet_count_by_chamber$per_seat_rate <- tweet_count_by_chamber$count / ifelse(tweet_count_by_chamber$chamber == "house", 435, 100)
tweet_count_by_chamber$percentage <- tweet_count_by_chamber$per_seat_rate / sum(tweet_count_by_chamber$per_seat_rate) * 100
tweet_count_by_chamber
## chamber count per_seat_rate percentage
## 1 house 3258025 7489.713 41.56819
## 2 senate 1052818 10528.180 58.43181
pie(tweet_count_by_chamber$per_seat_rate, labels = paste0(c("House", "Senate"), " (", round(tweet_count_by_chamber$percentage, 2), "%)"), main = "Tweets of U.S. Congress Members, by Chamber of Tweet Author, 2017-2023", col = c(HOUSE_HEX, SENATE_HEX))
title(sub = "Note: Per-seat figures are used to account for the difference in the number of seats between the House and the Senate")
Members of the Senate have tweeted more than members of the House on a per-seat basis by around 15%, indicating that Senate members are on average more active on Twitter than House members.
tweet_count_by_chamber_2018 <- dbGetQuery(con, "SELECT chamber, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2018-01-01' AND time < '2019-01-01' GROUP BY chamber")
tweet_count_by_chamber_2019 <- dbGetQuery(con, "SELECT chamber, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2019-01-01' AND time < '2020-01-01' GROUP BY chamber")
tweet_count_by_chamber_2020 <- dbGetQuery(con, "SELECT chamber, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2020-01-01' AND time < '2021-01-01' GROUP BY chamber")
tweet_count_by_chamber_2021 <- dbGetQuery(con, "SELECT chamber, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2021-01-01' AND time < '2022-01-01' GROUP BY chamber")
tweet_count_by_chamber_2022 <- dbGetQuery(con, "SELECT chamber, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2022-01-01' AND time < '2023-01-01' GROUP BY chamber")
tweet_count_by_chamber_2018$year <- 2018
tweet_count_by_chamber_2019$year <- 2019
tweet_count_by_chamber_2020$year <- 2020
tweet_count_by_chamber_2021$year <- 2021
tweet_count_by_chamber_2022$year <- 2022
tweet_count_by_chamber_ts <- rbind(tweet_count_by_chamber_2018, tweet_count_by_chamber_2019, tweet_count_by_chamber_2020, tweet_count_by_chamber_2021, tweet_count_by_chamber_2022)
tweet_count_by_chamber_ts$per_seat_rate <- tweet_count_by_chamber_ts$count / ifelse(tweet_count_by_chamber_ts$chamber == "house", 435, 100)
tweet_count_by_chamber_ts
## chamber count year per_seat_rate
## 1 house 453734 2018 1043.067
## 2 senate 185217 2018 1852.170
## 3 house 546119 2019 1255.446
## 4 senate 175450 2019 1754.500
## 5 house 664733 2020 1528.122
## 6 senate 217551 2020 2175.510
## 7 house 584241 2021 1343.083
## 8 senate 153845 2021 1538.450
## 9 house 559701 2022 1286.669
## 10 senate 161709 2022 1617.090
ggplot(tweet_count_by_chamber_ts, aes(x = year, y = per_seat_rate, fill = chamber)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = c(HOUSE_HEX, SENATE_HEX), labels = c("House", "Senate")) +
scale_y_continuous(labels = scales::comma) +
labs(title = "Tweets of U.S. Congress Members, by Chamber of Tweet Author and Year, 2018-2022", x = "Year", y = "Number of Tweets per Seat", fill = "Chamber", subtitle = "Note: Per-seat figures are used to account for the difference in the number of seats between the House and the Senate") +
theme_minimal()
Members of the Senate has tweeted more than members of the House on a per-seat basis in each year, which is consistent with the previous observation. The number of tweets from both chambers peaked in 2020, but the increase was more pronounced in the Senate, suggesting that the Senate members showed more enthusiasm in using X as a platform to communicate with the public during the election.
Moving on, let’s see some names. Who are the top tweeters in each year?
tweet_count_max_2018 <- dbGetQuery(con, "SELECT name, chamber, party, account_type, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2018-01-01' AND time < '2019-01-01' GROUP BY name, chamber, party, account_type ORDER BY count DESC LIMIT 10")
tweet_count_max_2019 <- dbGetQuery(con, "SELECT name, chamber, party, account_type, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2019-01-01' AND time < '2020-01-01' GROUP BY name, chamber, party, account_type ORDER BY count DESC LIMIT 10")
tweet_count_max_2020 <- dbGetQuery(con, "SELECT name, chamber, party, account_type, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2020-01-01' AND time < '2021-01-01' GROUP BY name, chamber, party, account_type ORDER BY count DESC LIMIT 10")
tweet_count_max_2021 <- dbGetQuery(con, "SELECT name, chamber, party, account_type, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2021-01-01' AND time < '2022-01-01' GROUP BY name, chamber, party, account_type ORDER BY count DESC LIMIT 10")
tweet_count_max_2022 <- dbGetQuery(con, "SELECT name, chamber, party, account_type, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2022-01-01' AND time < '2023-01-01' GROUP BY name, chamber, party, account_type ORDER BY count DESC LIMIT 10")
tweet_count_max_2018$year <- 2018
tweet_count_max_2019$year <- 2019
tweet_count_max_2020$year <- 2020
tweet_count_max_2021$year <- 2021
tweet_count_max_2022$year <- 2022
tweet_count_max_ts <- rbind(tweet_count_max_2018, tweet_count_max_2019, tweet_count_max_2020, tweet_count_max_2021, tweet_count_max_2022)
tweet_count_max_ts
## name chamber party account_type count year
## 1 Billy Long house R campaign 19672 2018
## 2 Lacy Clay house D campaign 8779 2018
## 3 Don Beyer house D office 6004 2018
## 4 Pramila Jayapal house D office 5087 2018
## 5 Ted Lieu house D campaign 4217 2018
## 6 Ro Khanna house D campaign 4187 2018
## 7 John Cornyn senate R office 4035 2018
## 8 Orrin Hatch senate R office 3954 2018
## 9 Marsha Blackburn senate R campaign 3862 2018
## 10 Ileana Ros-Lehtinen house R office 3730 2018
## 11 Billy Long house R campaign 13196 2019
## 12 Chip Roy house R campaign 10978 2019
## 13 Lacy Clay house D campaign 6005 2019
## 14 Don Beyer house D office 5922 2019
## 15 Joaquin Castro house D campaign 5828 2019
## 16 Andy Biggs house R office 5268 2019
## 17 Elizabeth Warren senate D campaign 4830 2019
## 18 Pramila Jayapal house D office 4521 2019
## 19 Bernie Sanders senate <NA> campaign 4498 2019
## 20 John Cornyn senate R office 4465 2019
## 21 Chip Roy house R campaign 14191 2020
## 22 Billy Long house R campaign 14173 2020
## 23 John Cornyn senate R office 8685 2020
## 24 Andy Biggs house R office 7290 2020
## 25 Dwight Evans house D office 6716 2020
## 26 Don Beyer house D office 6154 2020
## 27 Ted Cruz senate R campaign 4859 2020
## 28 Tim Burchett house R campaign 4462 2020
## 29 Pramila Jayapal house D office 4446 2020
## 30 Ruben Gallego house D campaign 4259 2020
## 31 Don Beyer house D office 5878 2021
## 32 John Cornyn senate R office 5762 2021
## 33 Claudia Tenney house R campaign 5451 2021
## 34 Dwight Evans house D office 5290 2021
## 35 Pramila Jayapal house D office 4930 2021
## 36 Billy Long house R campaign 4819 2021
## 37 Ted Cruz senate R campaign 4338 2021
## 38 Thomas Massie house R office 3907 2021
## 39 Sean Casten house D office 3895 2021
## 40 Don Bacon house R office 3792 2021
## 41 Billy Long house R campaign 10170 2022
## 42 Chip Roy house R campaign 7354 2022
## 43 Bill Pascrell house D office 5972 2022
## 44 Ted Cruz senate R campaign 5466 2022
## 45 John Cornyn senate R office 5367 2022
## 46 Eric Swalwell house D campaign 5093 2022
## 47 Don Bacon house R campaign 4751 2022
## 48 Don Beyer house D office 4535 2022
## 49 Dwight Evans house D office 4499 2022
## 50 Adam Kinzinger house R campaign 3915 2022
ggplot(tweet_count_max_ts, aes(x = count, y = reorder_within(name, count, year))) +
geom_col(aes(fill = party)) +
geom_col_pattern(aes(pattern = chamber), pattern_key_scale_factor = 0.5, fill = NA, color = "white", pattern_color = "white") +
facet_wrap(~year, scales = "free_y") +
labs(title = "Top 10 Congress Members with the Most Tweets, by Year", x = "Number of Tweets", y = "Congress Member", fill = "Party", pattern = "Chamber") +
scale_fill_manual(values = c(D_HEX, R_HEX, NA_HEX), labels = c("Democratic", "Republican", "Independent")) +
scale_pattern_manual(values = c("stripe", "crosshatch"), labels = c("House", "Senate")) +
scale_x_continuous(labels = scales::comma) +
scale_y_reordered(labels = function(x) gsub("___\\d{4}", "", x)) +
theme_minimal()
The top tweeters are evenly distributed between the two parties, with representation from both chambers that are consistent with the ratio of seats. We can see that the top tweeters are largely the same in each year, suggesting that some Congress members are consistently using X as a major platform to communicate with the public or deliver campaign messages, such as Rep. Billy Long (R), Rep. Don Beyer (D), Sen. John Cornyn (R). Notably, Texas senator Ted Cruz (R) appeared in the list in 2020 and has stayed in the top 10 since.
Let’s also analyze with respect to chamber.
hashtags_top_30_h <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE chamber = 'house') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 30")
hashtags_top_30_s <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE chamber = 'senate') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 30")
hashtags_top_30_h$chamber <- "House of Representatives"
hashtags_top_30_h$rank <- rank(-hashtags_top_30_h$count)
hashtags_top_30_s$chamber <- "Senate"
hashtags_top_30_s$rank <- rank(-hashtags_top_30_s$count)
hashtags_top_30_by_chamber <- rbind(hashtags_top_30_h, hashtags_top_30_s)
hashtags_top_30_by_chamber
## hashtag count chamber rank
## 1 #COVID19 47768 House of Representatives 1
## 2 #ForThePeople 17168 House of Representatives 2
## 3 #SOTU 12363 House of Representatives 3
## 4 #BuildBackBetter 11673 House of Representatives 4
## 5 #coronavirus 10296 House of Representatives 5
## 6 #AmericanRescuePlan 9763 House of Representatives 6
## 7 #GOPTaxScam 8490 House of Representatives 7
## 8 #EndGunViolence 6777 House of Representatives 8
## 9 #ProtectOurCare 6728 House of Representatives 9
## 10 #NY21 6543 House of Representatives 10
## 11 #2020Census 6197 House of Representatives 11
## 12 #DACA 6026 House of Representatives 12
## 13 #ChildTaxCredit 5884 House of Representatives 13
## 14 #PuertoRico 5871 House of Representatives 14
## 15 #AZ01 5766 House of Representatives 15
## 16 #BlackHistoryMonth 5520 House of Representatives 16
## 17 #TrumpShutdown 5493 House of Representatives 17
## 18 #SCOTUS 5371 House of Representatives 18
## 19 #TaxReform 5211 House of Representatives 19
## 20 #BREAKING 4703 House of Representatives 20
## 21 #GetCovered 4667 House of Representatives 21
## 22 #USMCA 4637 House of Representatives 22
## 23 #utpol 4539 House of Representatives 23
## 24 #BidenBorderCrisis 4464 House of Representatives 24
## 25 #HR1 4397 House of Representatives 25
## 26 #HeroesAct 4258 House of Representatives 26
## 27 #WomensHistoryMonth 4243 House of Representatives 27
## 28 #MAGA 4238 House of Representatives 28
## 29 #NetNeutrality 4225 House of Representatives 29
## 30 #ICYMI 4198 House of Representatives 30
## 31 #COVID19 13323 Senate 1
## 32 #gapol 4360 Senate 2
## 33 #SCOTUS 3059 Senate 3
## 34 #coronavirus 3039 Senate 4
## 35 #Ohio 2829 Senate 5
## 36 #NetNeutrality 2712 Senate 6
## 37 #mtpol 2665 Senate 7
## 38 #SOTU 2634 Senate 8
## 39 #TaxReform 2567 Senate 9
## 40 #Kentucky 2538 Senate 10
## 41 #China 2235 Senate 11
## 42 #utpol 2098 Senate 12
## 43 #Venezuela 2037 Senate 13
## 44 #gasen 2013 Senate 14
## 45 #WV 1783 Senate 15
## 46 #Florida 1755 Senate 16
## 47 #DACA 1664 Senate 17
## 48 #mtsen 1658 Senate 18
## 49 #GAsen 1594 Senate 19
## 50 #BuildBackBetter 1504 Senate 20
## 51 #taxreform 1477 Senate 21
## 52 #GOPTaxScam 1443 Senate 22
## 53 #BidenBorderCrisis 1435 Senate 23
## 54 #ICYMI 1434 Senate 24
## 55 #DemDebate 1401 Senate 25
## 56 #MOSen 1400 Senate 26
## 57 #ProtectOurCare 1388 Senate 27
## 58 #Senate 1384 Senate 28
## 59 #99countymeetings 1363 Senate 29
## 60 #BlackHistoryMonth 1350 Senate 30
ggplot(hashtags_top_30_by_chamber, aes(x = count, y = reorder_within(hashtag, -rank, chamber))) +
geom_col(aes(fill = chamber)) +
geom_text(aes(label = paste0("#", rank)), hjust = c(1.3, rep(-0.3, 29), 1.3, rep(-0.3, 29))) +
facet_wrap(~chamber, scales = "free") +
labs(title = "Top 30 Most Common Hashtags Used by Congress Members in Tweets, by Chamber, 2017-2023", x = "Number of Tweets", y = "Hashtag") +
scale_fill_manual(values = c(HOUSE_HEX, SENATE_HEX)) +
scale_x_continuous(labels = scales::comma) +
scale_y_reordered(labels = function(x) gsub("___.*", "", x)) +
theme_minimal() +
theme(legend.position = "none")
Next, we will analyze the most common words used by Congress members in their tweets.
# >>> ACTION REQUIRED: Please uncomment the following line and remove the workaround below
# tokens_top_30 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets) AS tokens GROUP BY word ORDER BY count DESC LIMIT 30")
# Workaround for a bug in process_text.py. I've fixed it so you won't need this anymore. You can safely remove this workaround, but you need to uncomment the code above.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
tokens_top_30 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets) AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(30)
# End of workaround
tokens_top_30$rank <- rank(-tokens_top_30$count)
tokens_top_30
## word count rank
## 1 people 344460 1
## 2 trump 295509 2
## 3 act 286611 3
## 4 president 284337 4
## 5 american 261957 5
## 6 house 258124 6
## 7 time 248561 7
## 8 health 242520 8
## 9 bill 238280 9
## 10 congress 235081 10
## 11 americans 234255 11
## 12 support 232138 12
## 13 day 223232 13
## 14 biden 218421 14
## 15 families 213354 15
## 16 country 208284 16
## 17 care 193531 17
## 18 america 178198 18
## 19 proud 177988 19
## 20 senate 173809 20
## 21 vote 171776 21
## 22 women 159371 22
## 23 community 155184 23
## 24 federal 145078 24
## 25 democrats 145043 25
## 26 week 143063 26
## 27 protect 142032 27
## 28 communities 140943 28
## 29 nation 139927 29
## 30 continue 138902 30
ggplot(tokens_top_30, aes(x = count, y = reorder(word, count))) +
geom_col() +
geom_text(aes(label = paste0("#", rank)), hjust = -0.3) +
labs(title = "Top 30 Most Common Words Used by Congress Members in Tweets, 2017-2023", x = "Number of Occurrences", y = "Word") +
scale_x_continuous(labels = scales::comma) +
theme_minimal()
The most common words used by Congress members in their tweets are “people”, “trump”, “act”, “president”, “american”, and etc. These words are general and do not provide much insight into the content of the tweets. However, we can see that the word “trump” is one of the most common, which suggests that the former president Donald Trump was a major topic of discussion among Congress members during his time in office.
Let’s also analyze with respect to party.
# >>> ACTION REQUIRED: Please uncomment the following lines and remove the workaround below
# tokens_top_30_d <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D') AS tokens GROUP BY word ORDER BY count DESC LIMIT 30")
# tokens_top_30_r <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R') AS tokens GROUP BY word ORDER BY count DESC LIMIT 30")
# Workaround for a bug in process_text.py. I've fixed it so you won't need this anymore. You can safely remove this workaround, but you need to uncomment the code above.
tokens_top_30_d <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(30)
tokens_top_30_r <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(30)
# End of workaround
tokens_top_30_d$party <- "Democratic Party"
tokens_top_30_d$rank <- rank(-tokens_top_30_d$count)
tokens_top_30_r$party <- "Republican Party"
tokens_top_30_r$rank <- rank(-tokens_top_30_r$count)
tokens_top_30_by_party <- rbind(tokens_top_30_d, tokens_top_30_r)
tokens_top_30_by_party
## word count party rank
## 1 trump 236189 Democratic Party 1
## 2 people 225227 Democratic Party 2
## 3 act 200323 Democratic Party 3
## 4 health 194126 Democratic Party 4
## 5 president 165742 Democratic Party 5
## 6 house 156304 Democratic Party 6
## 7 time 155262 Democratic Party 7
## 8 congress 153866 Democratic Party 8
## 9 families 152459 Democratic Party 9
## 10 care 152418 Democratic Party 10
## 11 support 142371 Democratic Party 11
## 12 bill 142153 Democratic Party 12
## 13 americans 140463 Democratic Party 13
## 14 american 139352 Democratic Party 14
## 15 day 134343 Democratic Party 15
## 16 country 131589 Democratic Party 16
## 17 vote 119440 Democratic Party 17
## 18 proud 118659 Democratic Party 18
## 19 women 115740 Democratic Party 19
## 20 community 112359 Democratic Party 20
## 21 senate 110183 Democratic Party 21
## 22 communities 102637 Democratic Party 22
## 23 protect 99892 Democratic Party 23
## 24 fight 94356 Democratic Party 24
## 25 federal 93832 Democratic Party 25
## 26 join 90213 Democratic Party 26
## 27 america 88892 Democratic Party 27
## 28 public 86985 Democratic Party 28
## 29 rights 86401 Democratic Party 29
## 30 continue 86179 Democratic Party 30
## 31 biden 170937 Republican Party 1
## 32 american 120359 Republican Party 2
## 33 president 116314 Republican Party 3
## 34 people 113273 Republican Party 4
## 35 house 100287 Republican Party 5
## 36 border 98775 Republican Party 6
## 37 bill 93545 Republican Party 7
## 38 americans 91526 Republican Party 8
## 39 democrats 90945 Republican Party 9
## 40 time 90559 Republican Party 10
## 41 america 87434 Republican Party 11
## 42 support 87298 Republican Party 12
## 43 day 86907 Republican Party 13
## 44 act 84121 Republican Party 14
## 45 congress 78953 Republican Party 15
## 46 country 73466 Republican Party 16
## 47 nation 62459 Republican Party 17
## 48 national 61885 Republican Party 18
## 49 senate 61778 Republican Party 19
## 50 week 60275 Republican Party 20
## 51 families 59122 Republican Party 21
## 52 proud 58101 Republican Party 22
## 53 trump 55816 Republican Party 23
## 54 tax 55486 Republican Party 24
## 55 law 51769 Republican Party 25
## 56 continue 51726 Republican Party 26
## 57 energy 51666 Republican Party 27
## 58 crisis 51022 Republican Party 28
## 59 vote 50284 Republican Party 29
## 60 federal 49879 Republican Party 30
ggplot(tokens_top_30_by_party, aes(x = count, y = reorder_within(word, count, party))) +
geom_col(aes(fill = party)) +
geom_text(aes(label = paste0("#", rank)), hjust = c(1.3, rep(-0.3, 29), 1.3, rep(-0.3, 29))) +
facet_wrap(~party, scales = "free") +
labs(title = "Top 30 Most Common Words Used by Congress Members in Tweets, by Party, 2017-2023", x = "Number of Occurrences", y = "Word") +
scale_fill_manual(values = c(D_HEX, R_HEX)) +
scale_x_continuous(labels = scales::comma) +
scale_y_reordered(labels = function(x) gsub("___.*", "", x)) +
theme_minimal() +
theme(legend.position = "none")
The most common words used by Democratic members in their tweets are “trump”, “people”, “act”, “health”, “president”, and etc. Notably ones down the list are “women”, “community/communities”, “protect”, “fight”, “public”, and “rights”. These words suggest that Democratic members discussed majorly about the former president Donald Trump, as well as current issues such as healthcare and rights of women and other communities.
On the other hand, the most common words used by Republican members in their tweets are “biden”, “american”, “president”, “people”, “house”, and etc. Again, noteworthy ones down the list are “border”, “democrats”, “nation”, “national”, “trump”, “tax”, “law”, “energy”, and “crisis”. These words suggest that Republican members discussed majorly about the current president Joe Biden, as well as current issues such as border security, energy policy, and tax law. Moreover, the word “trump” is still present in the list, indicating that the former president Donald Trump was still a major topic of discussion among Republican members.
It is surprising that gun control, climate change, and other major current issues which are present in the list of top hashtags, are not present in the list of top words. This suggests that Congress members are relying more on hashtags to convey their stance on these issues, rather than addressing them directly in their tweets.
Let’s also analyze with respect to party and year.
# >>> ACTION REQUIRED: Please uncomment the following lines and remove the workaround below
# tokens_top_10_d_2018 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2018-01-01' AND time < '2019-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_r_2018 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2018-01-01' AND time < '2019-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_d_2019 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2019-01-01' AND time < '2020-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_r_2019 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2019-01-01' AND time < '2020-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_d_2020 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2020-01-01' AND time < '2021-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_r_2020 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2020-01-01' AND time < '2021-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_d_2021 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2021-01-01' AND time < '2022-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_r_2021 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2021-01-01' AND time < '2022-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_d_2022 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2022-01-01' AND time < '2023-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_r_2022 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2022-01-01' AND time < '2023-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# Workaround for a bug in process_text.py. I've fixed it so you won't need this anymore. You can safely remove this workaround, but you need to uncomment the code above.
tokens_top_10_d_2018 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2018-01-01' AND time < '2019-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(10)
tokens_top_10_r_2018 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2018-01-01' AND time < '2019-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(10)
tokens_top_10_d_2019 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2019-01-01' AND time < '2020-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(10)
tokens_top_10_r_2019 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2019-01-01' AND time < '2020-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(10)
tokens_top_10_d_2020 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2020-01-01' AND time < '2021-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(10)
tokens_top_10_r_2020 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2020-01-01' AND time < '2021-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(10)
tokens_top_10_d_2021 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2021-01-01' AND time < '2022-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(10)
tokens_top_10_r_2021 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2021-01-01' AND time < '2022-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(10)
tokens_top_10_d_2022 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2022-01-01' AND time < '2023-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(10)
tokens_top_10_r_2022 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2022-01-01' AND time < '2023-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(10)
# End of workaround
tokens_top_10_d_2018$year <- 2018
tokens_top_10_r_2018$year <- 2018
tokens_top_10_d_2019$year <- 2019
tokens_top_10_r_2019$year <- 2019
tokens_top_10_d_2020$year <- 2020
tokens_top_10_r_2020$year <- 2020
tokens_top_10_d_2021$year <- 2021
tokens_top_10_r_2021$year <- 2021
tokens_top_10_d_2022$year <- 2022
tokens_top_10_r_2022$year <- 2022
tokens_top_10_d_2018$party <- "D"
tokens_top_10_r_2018$party <- "R"
tokens_top_10_d_2019$party <- "D"
tokens_top_10_r_2019$party <- "R"
tokens_top_10_d_2020$party <- "D"
tokens_top_10_r_2020$party <- "R"
tokens_top_10_d_2021$party <- "D"
tokens_top_10_r_2021$party <- "R"
tokens_top_10_d_2022$party <- "D"
tokens_top_10_r_2022$party <- "R"
tokens_top_10_by_party_and_year <- rbind(tokens_top_10_d_2018, tokens_top_10_r_2018, tokens_top_10_d_2019, tokens_top_10_r_2019, tokens_top_10_d_2020, tokens_top_10_r_2020, tokens_top_10_d_2021, tokens_top_10_r_2021, tokens_top_10_d_2022, tokens_top_10_r_2022)
tokens_top_10_by_party_and_year
## word count year party
## 1 trump 50562 2018 D
## 2 people 29776 2018 D
## 3 president 25517 2018 D
## 4 congress 23849 2018 D
## 5 families 21748 2018 D
## 6 health 21598 2018 D
## 7 time 21116 2018 D
## 8 vote 19832 2018 D
## 9 country 19552 2018 D
## 10 house 19496 2018 D
## 11 support 14552 2018 R
## 12 bill 14199 2018 R
## 13 day 13786 2018 R
## 14 house 13264 2018 R
## 15 people 13036 2018 R
## 16 tax 12919 2018 R
## 17 time 12722 2018 R
## 18 president 12522 2018 R
## 19 senate 12365 2018 R
## 20 american 11575 2018 R
## 21 trump 64391 2019 D
## 22 president 46566 2019 D
## 23 people 45418 2019 D
## 24 house 38494 2019 D
## 25 act 36065 2019 D
## 26 congress 33137 2019 D
## 27 time 30891 2019 D
## 28 health 29954 2019 D
## 29 country 27052 2019 D
## 30 american 26558 2019 D
## 31 president 18596 2019 R
## 32 people 17884 2019 R
## 33 democrats 17841 2019 R
## 34 house 17254 2019 R
## 35 american 15654 2019 R
## 36 time 15470 2019 R
## 37 border 14800 2019 R
## 38 realdonaldtrump 14176 2019 R
## 39 congress 13622 2019 R
## 40 support 13298 2019 R
## 41 trump 72107 2020 D
## 42 health 55868 2020 D
## 43 people 52782 2020 D
## 44 president 43116 2020 D
## 45 act 41588 2020 D
## 46 americans 37391 2020 D
## 47 vote 36386 2020 D
## 48 care 34844 2020 D
## 49 covid19 34606 2020 D
## 50 time 33921 2020 D
## 51 realdonaldtrump 24247 2020 R
## 52 people 23287 2020 R
## 53 president 22920 2020 R
## 54 american 22684 2020 R
## 55 coronavirus 20670 2020 R
## 56 support 19399 2020 R
## 57 time 18487 2020 R
## 58 america 17697 2020 R
## 59 businesses 16965 2020 R
## 60 day 16766 2020 R
## 61 act 44490 2021 D
## 62 people 38842 2021 D
## 63 families 32646 2021 D
## 64 american 30485 2021 D
## 65 health 30050 2021 D
## 66 time 28616 2021 D
## 67 bill 26405 2021 D
## 68 care 25938 2021 D
## 69 support 25638 2021 D
## 70 house 24792 2021 D
## 71 biden 61150 2021 R
## 72 border 29875 2021 R
## 73 president 25348 2021 R
## 74 democrats 25256 2021 R
## 75 american 24835 2021 R
## 76 people 23791 2021 R
## 77 americans 22233 2021 R
## 78 bill 21846 2021 R
## 79 america 18654 2021 R
## 80 house 17748 2021 R
## 81 act 42479 2022 D
## 82 people 33985 2022 D
## 83 health 31015 2022 D
## 84 support 28402 2022 D
## 85 care 25819 2022 D
## 86 families 25526 2022 D
## 87 day 25467 2022 D
## 88 congress 24863 2022 D
## 89 time 24587 2022 D
## 90 proud 23175 2022 D
## 91 biden 68514 2022 R
## 92 border 31787 2022 R
## 93 american 28227 2022 R
## 94 inflation 23335 2022 R
## 95 energy 23160 2022 R
## 96 president 22837 2022 R
## 97 americans 22654 2022 R
## 98 people 21468 2022 R
## 99 democrats 20855 2022 R
## 100 america 19406 2022 R
ggplot(tokens_top_10_by_party_and_year, aes(x = count, y = reorder_within(word, count, interaction(year, party)), fill = party)) +
geom_col() +
facet_grid2(party ~ year, scales = "free", independent = "y", labeller = as_labeller(c("D" = "Democratic Party", "R" = "Republican Party", "2018" = "2018", "2019" = "2019", "2020" = "2020", "2021" = "2021", "2022" = "2022"))) +
labs(title = "Top 10 Most Common Words Used by Congress Members in Tweets, by Party and Year, 2018-2022", x = "Number of Occurrences", y = "Word", fill = "Party") +
scale_fill_manual(values = c(D_HEX, R_HEX)) +
scale_x_continuous(labels = scales::comma) +
scale_y_reordered(labels = function(x) gsub("___\\d{4}\\.(D|R)", "", x)) +
theme_minimal() +
theme(legend.position = "none")
“Trump” was the top word for Democratic members during the Trump administration, while “Biden” was the top word for Republican members during the Biden administration, suggesting the opposing party like to mention the ruling president in their tweets.
Democratic members’ usage of words have been consistent over the years, focusing on healthcare. On the other hand, Republican members’ usage of words have evolved, likely reacting to developments in various issues. For example, the popularity of the word “border” went from outside the top 10 in 2018 to the top 2 in 2022. There are also introduction of new words in the top 10 since the Biden administration, such as “inflation” and “energy”, suggesting that Republican members are criticizing the Biden administration’s policies on these issues.
Let’s also analyze with respect to chamber.
# >>> ACTION REQUIRED: Please uncomment the following lines and remove the workaround below
# tokens_top_30_h <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE chamber = 'house') AS tokens GROUP BY word ORDER BY count DESC LIMIT 30")
# tokens_top_30_s <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE chamber = 'senate') AS tokens GROUP BY word ORDER BY count DESC LIMIT 30")
# Workaround for a bug in process_text.py. I've fixed it so you won't need this anymore. You can safely remove this workaround, but you need to uncomment the code above.
tokens_top_30_h <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE chamber = 'house') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(30)
tokens_top_30_s <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE chamber = 'senate') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
anti_join(stop_words, by = "word") |>
head(30)
# End of workaround
tokens_top_30_h$chamber <- "House of Representatives"
tokens_top_30_h$rank <- rank(-tokens_top_30_h$count)
tokens_top_30_s$chamber <- "Senate"
tokens_top_30_s$rank <- rank(-tokens_top_30_s$count)
tokens_top_30_by_chamber <- rbind(tokens_top_30_h, tokens_top_30_s)
tokens_top_30_by_chamber
## word count chamber rank
## 1 people 263837 House of Representatives 1
## 2 trump 227818 House of Representatives 2
## 3 house 227805 House of Representatives 3
## 4 act 216053 House of Representatives 4
## 5 president 215188 House of Representatives 5
## 6 american 201051 House of Representatives 6
## 7 congress 194269 House of Representatives 7
## 8 time 190065 House of Representatives 8
## 9 americans 177940 House of Representatives 9
## 10 day 173473 House of Representatives 10
## 11 health 171928 House of Representatives 11
## 12 support 171891 House of Representatives 12
## 13 bill 170712 House of Representatives 13
## 14 biden 165887 House of Representatives 14
## 15 families 158523 House of Representatives 15
## 16 country 153382 House of Representatives 16
## 17 proud 139764 House of Representatives 17
## 18 america 138306 House of Representatives 18
## 19 care 132731 House of Representatives 19
## 20 vote 130353 House of Representatives 20
## 21 community 129029 House of Representatives 21
## 22 women 120854 House of Representatives 22
## 23 week 111774 House of Representatives 23
## 24 democrats 111676 House of Representatives 24
## 25 join 107640 House of Representatives 25
## 26 federal 107106 House of Representatives 26
## 27 communities 105625 House of Representatives 27
## 28 nation 105092 House of Representatives 28
## 29 continue 104495 House of Representatives 29
## 30 protect 103885 House of Representatives 30
## 31 senate 90259 Senate 1
## 32 people 80623 Senate 2
## 33 health 70592 Senate 3
## 34 act 70558 Senate 4
## 35 president 69149 Senate 5
## 36 trump 67691 Senate 6
## 37 bill 67568 Senate 7
## 38 american 60906 Senate 8
## 39 care 60800 Senate 9
## 40 support 60247 Senate 10
## 41 time 58496 Senate 11
## 42 americans 56315 Senate 12
## 43 country 54902 Senate 13
## 44 families 54831 Senate 14
## 45 biden 52534 Senate 15
## 46 day 49759 Senate 16
## 47 vote 41423 Senate 17
## 48 congress 40812 Senate 18
## 49 america 39892 Senate 19
## 50 bipartisan 39294 Senate 20
## 51 women 38517 Senate 21
## 52 proud 38224 Senate 22
## 53 protect 38147 Senate 23
## 54 federal 37972 Senate 24
## 55 law 37206 Senate 25
## 56 national 37094 Senate 26
## 57 communities 35318 Senate 27
## 58 fight 35157 Senate 28
## 59 nation 34835 Senate 29
## 60 continue 34407 Senate 30
ggplot(tokens_top_30_by_chamber, aes(x = count, y = reorder_within(word, count, chamber))) +
geom_col(aes(fill = chamber)) +
geom_text(aes(label = paste0("#", rank)), hjust = c(1.3, rep(-0.3, 29), 1.3, rep(-0.3, 29))) +
facet_wrap(~chamber, scales = "free") +
labs(title = "Top 30 Most Common Words Used by Congress Members in Tweets, by Chamber, 2017-2023", x = "Number of Occurrences", y = "Word") +
scale_fill_manual(values = c(HOUSE_HEX, SENATE_HEX)) +
scale_x_continuous(labels = scales::comma) +
scale_y_reordered(labels = function(x) gsub("___.*", "", x)) +
theme_minimal() +
theme(legend.position = "none")
The most common words are mostly the same with the previous analysis that is by party. However, one particular word that stands out is “bipartisan” in the list of top words used by Senate members. This suggests that Senate members are more likely to discuss bipartisan issues and cooperation in their tweets compared to House members.
Let’s proceed to the analysis of tweet content. As mentioned, text mining is completed in Python beforehand. We can now retrieve the result from the database.
First, we will analyze the sentiment of tweets by party affiliation of the author.
sentiment_by_party <- dbGetQuery(con, "SELECT party, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id GROUP BY party")
sentiment_by_party
## party avg
## 1 D 0.2252987
## 2 R 0.2144899
## 3 <NA> 0.2358065
ggplot(sentiment_by_party, aes(x = party, y = avg)) +
geom_col(fill = c(D_HEX, R_HEX, NA_HEX)) +
labs(title = "Average Sentiment of Tweets of U.S. Congress Members, by Party, 2017-2023", x = "Party", y = "Average Sentiment Score") +
scale_x_discrete(labels = c("Democratic", "Republican", "Independent")) +
scale_y_continuous(labels = scales::comma) +
theme_minimal()
The order of the average sentiment scores from highest to lowest is independent, Democratic, and Republican. This indicates that independent members have the most positive sentiment in their tweets, followed by Democratic members, and then Republican members. However, the difference in sentiment scores between the three parties is not significant, so we can only conclude that the sentiment of tweets of Congress members is generally positive.
Let’s see if the sentiment of tweets has changed over time.
sentiment_by_party_2018 <- dbGetQuery(con, "SELECT party, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2018-01-01' AND time < '2019-01-01' GROUP BY party")
sentiment_by_party_2019 <- dbGetQuery(con, "SELECT party, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2019-01-01' AND time < '2020-01-01' GROUP BY party")
sentiment_by_party_2020 <- dbGetQuery(con, "SELECT party, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2020-01-01' AND time < '2021-01-01' GROUP BY party")
sentiment_by_party_2021 <- dbGetQuery(con, "SELECT party, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2021-01-01' AND time < '2022-01-01' GROUP BY party")
sentiment_by_party_2022 <- dbGetQuery(con, "SELECT party, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2022-01-01' AND time < '2023-01-01' GROUP BY party")
sentiment_by_party_2018$year <- 2018
sentiment_by_party_2019$year <- 2019
sentiment_by_party_2020$year <- 2020
sentiment_by_party_2021$year <- 2021
sentiment_by_party_2022$year <- 2022
sentiment_by_party_ts <- rbind(sentiment_by_party_2018, sentiment_by_party_2019, sentiment_by_party_2020, sentiment_by_party_2021, sentiment_by_party_2022)
sentiment_by_party_ts
## party avg year
## 1 D 0.1944888 2018
## 2 R 0.3320895 2018
## 3 <NA> 0.2534372 2018
## 4 D 0.1939735 2019
## 5 R 0.2628996 2019
## 6 <NA> 0.1848596 2019
## 7 D 0.2036529 2020
## 8 R 0.2734945 2020
## 9 <NA> 0.1799959 2020
## 10 D 0.2799707 2021
## 11 R 0.1292012 2021
## 12 <NA> 0.3166578 2021
## 13 D 0.2901276 2022
## 14 R 0.1199303 2022
## 15 <NA> 0.3831453 2022
ggplot(sentiment_by_party_ts, aes(x = year, y = avg, fill = party)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Average Sentiment of Tweets of U.S. Congress Members, by Party and Year, 2018-2022", x = "Year", y = "Average Sentiment Score", fill = "Party") +
scale_fill_manual(values = c(D_HEX, R_HEX, NA_HEX), labels = c("Democratic", "Republican", "Independent")) +
scale_y_continuous(labels = scales::comma) +
theme_minimal()
While the sentiment of tweets of Democratic members has been increasing from 2018 to 2022, the converse is true for Republican members. This suggests that Democratic members have become more positive in their tweets over time, while Republican members have become less positive. Surprisingly, the sentiment of tweets of independent members turned out to be the most positive in 2022 after reaching a low in 2020. But this could be due to the small sample size of independent members.
Now, we will run the same analysis but by chamber of the author.
sentiment_by_chamber <- dbGetQuery(con, "SELECT chamber, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id GROUP BY chamber")
sentiment_by_chamber
## chamber avg
## 1 house 0.2178552
## 2 senate 0.2306518
ggplot(sentiment_by_chamber, aes(x = chamber, y = avg)) +
geom_col(fill = c(HOUSE_HEX, SENATE_HEX)) +
labs(title = "Average Sentiment of Tweets of U.S. Congress Members, by Chamber, 2017-2023", x = "Chamber", y = "Average Sentiment Score") +
scale_x_discrete(labels = c("House", "Senate")) +
scale_y_continuous(labels = scales::comma) +
theme_minimal()
sentiment_by_chamber_2018 <- dbGetQuery(con, "SELECT chamber, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2018-01-01' AND time < '2019-01-01' GROUP BY chamber")
sentiment_by_chamber_2019 <- dbGetQuery(con, "SELECT chamber, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2019-01-01' AND time < '2020-01-01' GROUP BY chamber")
sentiment_by_chamber_2020 <- dbGetQuery(con, "SELECT chamber, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2020-01-01' AND time < '2021-01-01' GROUP BY chamber")
sentiment_by_chamber_2021 <- dbGetQuery(con, "SELECT chamber, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2021-01-01' AND time < '2022-01-01' GROUP BY chamber")
sentiment_by_chamber_2022 <- dbGetQuery(con, "SELECT chamber, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2022-01-01' AND time < '2023-01-01' GROUP BY chamber")
sentiment_by_chamber_2018$year <- 2018
sentiment_by_chamber_2019$year <- 2019
sentiment_by_chamber_2020$year <- 2020
sentiment_by_chamber_2021$year <- 2021
sentiment_by_chamber_2022$year <- 2022
sentiment_by_chamber_ts <- rbind(sentiment_by_chamber_2018, sentiment_by_chamber_2019, sentiment_by_chamber_2020, sentiment_by_chamber_2021, sentiment_by_chamber_2022)
sentiment_by_chamber_ts
## chamber avg year
## 1 house 0.2515259 2018
## 2 senate 0.2530553 2018
## 3 house 0.2151629 2019
## 4 senate 0.2281195 2019
## 5 house 0.2220529 2020
## 6 senate 0.2528610 2020
## 7 house 0.2138896 2021
## 8 senate 0.2167865 2021
## 9 house 0.2123538 2022
## 10 senate 0.2219200 2022
ggplot(sentiment_by_chamber_ts, aes(x = year, y = avg, fill = chamber)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Average Sentiment of Tweets of U.S. Congress Members, by Chamber and Year, 2018-2022", x = "Year", y = "Average Sentiment Score", fill = "Chamber") +
scale_fill_manual(values = c(HOUSE_HEX, SENATE_HEX), labels = c("House", "Senate")) +
scale_y_continuous(labels = scales::comma) +
theme_minimal()
Senate members have a slightly higher average sentiment score than Senate members in each year. Apart from that, the sentiment of tweets of both chambers has been relatively stable from 2018 to 2022, with a slight decrease in 2022 for both chambers, and a slight increase in 2020 for the Senate. This suggests that the sentiment of tweets of Congress members has been generally positive.
ML topic inference is something that is not covered in this project. I will improve this part in the future. However, we can still analyze the sentiment of tweets by party and topic through tokens extracted from text mining.
sentiment_by_topic <- dbGetQuery(con, "SELECT party, topic, AVG(sentiment_score) FROM (SELECT unnest(tokens) AS topic, sentiment_score, party FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party IS NOT NULL) AS topics WHERE topic LIKE 'vaccine' OR topic LIKE 'lockdown' OR topic LIKE 'climate' OR topic LIKE 'abortion' OR topic LIKE 'gun' OR topic LIKE 'lgbtq' OR topic LIKE 'blm' OR topic LIKE 'gender' OR topic LIKE 'diversity' OR topic LIKE 'immigration' OR topic LIKE 'biden' OR topic LIKE 'trump' GROUP BY party, topic ORDER BY avg DESC")
sentiment_by_topic
## party topic avg
## 1 D diversity 0.537440683
## 2 D vaccine 0.370930798
## 3 D lgbtq 0.279358000
## 4 D gender 0.256078122
## 5 R diversity 0.249240119
## 6 D biden 0.246496491
## 7 R lgbtq 0.237023246
## 8 R vaccine 0.182258747
## 9 D abortion 0.166727510
## 10 R climate 0.150572902
## 11 D climate 0.109176012
## 12 R trump 0.105979699
## 13 R abortion 0.103581481
## 14 D immigration 0.080924406
## 15 D blm 0.010409462
## 16 R gender 0.005042393
## 17 D trump -0.103745331
## 18 R immigration -0.104774530
## 19 D lockdown -0.107225054
## 20 R lockdown -0.129531098
## 21 R biden -0.165162030
## 22 R blm -0.173421729
## 23 R gun -0.319348096
## 24 D gun -0.480987827
ggplot(sentiment_by_topic, aes(x = topic, y = avg, fill = party)) +
geom_col(position = "dodge") +
labs(title = "Average Sentiment of Tweets of U.S. Congress Members, by Topic and Party, 2017-2023", x = "Topic", y = "Average Sentiment Score", fill = "Party", subtitle = "Note: Indepedent members are excluded due to small sample size") +
scale_fill_manual(values = c(D_HEX, R_HEX, NA_HEX), labels = c("Democratic", "Republican")) +
scale_y_continuous(labels = scales::comma) +
theme_minimal() +
theme(legend.position = "none")
From the plot, we can see how Democratic and Republican members have spoken about different current issues. Democratic members have a more positive sentiment when talking about topics such as “vaccine”, “abortion”, “diversity”, “immigration”, and “LGBTQ”. On the other hand, Republican members have a more positive (or less negative) sentiment when talking about topics such as “Trump”, “gun”, and “climate”. It is interesting to see that Republican members have a more positive sentiment when talking about climate, which is usually a topic that is associated with Democratic members.
This concludes the sentiment analysis.
Now that we have finished analyzing text data, let’s also look at some axillary data. We will start by analyzing the most frequently retweeted accounts by Congress members.
accounts_retweeted_top_30 <- dbGetQuery(con, "SELECT retweeting, COUNT(*) FROM tweets WHERE retweeting IS NOT NULL GROUP BY retweeting ORDER BY count DESC LIMIT 30")
accounts_retweeted_top_30$rank <- rank(-accounts_retweeted_top_30$count)
accounts_retweeted_top_30
## retweeting count rank
## 1 HouseGOP 12645 1
## 2 realDonaldTrump 6434 2
## 3 GOPLeader 6191 3
## 4 BillPascrell 6130 4
## 5 POTUS 5438 5
## 6 WhiteHouse 5266 6
## 7 thehill 5200 7
## 8 FoxNews 4919 8
## 9 HouseDemocrats 4422 9
## 10 WaysandMeansGOP 4388 10
## 11 SenateGOP 4155 11
## 12 HispanicCaucus 3852 12
## 13 SpeakerPelosi 3801 13
## 14 SteveScalise 3745 14
## 15 JudiciaryGOP 3522 15
## 16 WaysMeansCmte 3403 16
## 17 HouseCommerce 3348 17
## 18 RepDonBeyer 3305 18
## 19 JoeBiden 3199 19
## 20 OversightDems 3082 20
## 21 HouseJudiciary 2951 21
## 22 GOPoversight 2950 22
## 23 RepCasten 2909 23
## 24 ChuckGrassley 2740 24
## 25 dcexaminer 2191 25
## 26 kylegriffin1 2168 26
## 27 RepAdamSchiff 2131 27
## 28 SenateDems 2084 28
## 29 RepTenney 2050 29
## 30 GOP 1977 30
ggplot(accounts_retweeted_top_30, aes(x = count, y = reorder(retweeting, count))) +
geom_col() +
geom_text(aes(label = paste0("#", rank)), hjust = -0.3) +
labs(title = "Top 30 Most Frequently Retweeted Accounts by Congress Members, 2017-2023", x = "Number of Retweets", y = "Account") +
scale_x_continuous(labels = scales::comma) +
theme_minimal()
The most common accounts retweeted by Congress members are “@HouseGOP,”@realDonaldTrump“,”@GOPLeader“,”@BillPascrell“,”@POTUS“, and etc. These accounts are mostly related to the Republican Party, with the exception of”@BillPascrell“, who is a Democratic member of the House of Representatives. This suggests that Republican Congress members are more likely to amplify the messages of their party and its leaders by retweeting their accounts, compared to Democratic members.
Let’s also analyze with respect to party.
accounts_retweeted_top_30_d <- dbGetQuery(con, "SELECT retweeting, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND retweeting IS NOT NULL GROUP BY retweeting ORDER BY count DESC LIMIT 30")
accounts_retweeted_top_30_r <- dbGetQuery(con, "SELECT retweeting, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND retweeting IS NOT NULL GROUP BY retweeting ORDER BY count DESC LIMIT 30")
accounts_retweeted_top_30_d$party <- "Democrat"
accounts_retweeted_top_30_d$rank <- rank(-accounts_retweeted_top_30_d$count)
accounts_retweeted_top_30_r$party <- "Republican"
accounts_retweeted_top_30_r$rank <- rank(-accounts_retweeted_top_30_r$count)
accounts_retweeted_top_30_by_party <- rbind(accounts_retweeted_top_30_d, accounts_retweeted_top_30_r)
accounts_retweeted_top_30_by_party
## retweeting count party rank
## 1 BillPascrell 6116 Democrat 1
## 2 POTUS 5413 Democrat 2
## 3 HouseDemocrats 4422 Democrat 3
## 4 HispanicCaucus 3851 Democrat 4
## 5 SpeakerPelosi 3797 Democrat 5
## 6 WaysMeansCmte 3399 Democrat 6
## 7 RepDonBeyer 3291 Democrat 7
## 8 JoeBiden 3193 Democrat 8
## 9 thehill 3073 Democrat 9
## 10 OversightDems 3061 Democrat 10
## 11 RepCasten 2908 Democrat 11
## 12 HouseJudiciary 2707 Democrat 12
## 13 kylegriffin1 2158 Democrat 13
## 14 RepAdamSchiff 2129 Democrat 14
## 15 WhiteHouse 2082 Democrat 15
## 16 SenateDems 2078 Democrat 16
## 17 NRDems 1965 Democrat 17
## 18 AppropsDems 1915 Democrat 18
## 19 USProgressives 1888 Democrat 19
## 20 TheDemocrats 1885 Democrat 20
## 21 TheBlackCaucus 1875 Democrat 21
## 22 VetAffairsDems 1795 Democrat 22
## 23 nowthisnews 1758 Democrat 23
## 24 RepCohen 1752 Democrat 24
## 25 TeamPelosi 1725 Democrat 25
## 26 HouseBudgetDems 1688 Democrat 26
## 27 RepEspaillat 1667 Democrat 27
## 28 KamalaHarris 1577 Democrat 28
## 29 HouseForeign 1563 Democrat 29
## 30 JulianCastro 1554 Democrat 30
## 31 HouseGOP 12631 Republican 1
## 32 realDonaldTrump 6387 Republican 2
## 33 GOPLeader 6184 Republican 3
## 34 FoxNews 4814 Republican 4
## 35 WaysandMeansGOP 4371 Republican 5
## 36 SenateGOP 4153 Republican 6
## 37 SteveScalise 3736 Republican 7
## 38 JudiciaryGOP 3521 Republican 8
## 39 HouseCommerce 3344 Republican 9
## 40 WhiteHouse 3180 Republican 10
## 41 GOPoversight 2949 Republican 11
## 42 ChuckGrassley 2718 Republican 12
## 43 dcexaminer 2126 Republican 13
## 44 thehill 2099 Republican 14
## 45 RepTenney 2049 Republican 15
## 46 GOP 1977 Republican 16
## 47 westerncaucus 1973 Republican 17
## 48 Jim_Jordan 1902 Republican 18
## 49 DonaldJTrumpJr 1795 Republican 19
## 50 RealJamesWoods 1712 Republican 20
## 51 RepMalliotakis 1660 Republican 21
## 52 newsmax 1619 Republican 22
## 53 GOPChairwoman 1589 Republican 23
## 54 senatemajldr 1569 Republican 24
## 55 SpeakerRyan 1526 Republican 25
## 56 HouseForeignGOP 1517 Republican 26
## 57 charliekirk11 1472 Republican 27
## 58 HomelandGOP 1467 Republican 28
## 59 RepAndyBiggsAZ 1400 Republican 29
## 60 EdLaborGOP 1379 Republican 30
ggplot(accounts_retweeted_top_30_by_party, aes(x = count, y = reorder_within(retweeting, count, party))) +
geom_col(aes(fill = party)) +
geom_text(aes(label = paste0("#", rank)), hjust = c(1.3, rep(-0.3, 29), 1.3, rep(-0.3, 29))) +
facet_wrap(~party, scales = "free") +
labs(title = "Top 30 Most Frequently Retweeted Accounts by Congress Members, by Party, 2017-2023", x = "Number of Retweets", y = "Account") +
scale_fill_manual(values = c(D_HEX, R_HEX)) +
scale_y_reordered(labels = function(x) gsub("___.*", "", x)) +
theme_minimal() +
theme(legend.position = "none")
## Don't know how to automatically pick scale for object of type <integer64>.
## Defaulting to continuous.
The most common accounts retweeted by Democratic members are “@BillPascrell”, “@POTUS”, “@HouseDemocrats”, “@HispanicCaucus”, “@SpeakerPelosi”, and etc. Meanwhile, the most common accounts retweeted by Republican members are “@HouseGOP”, “@realDonaldTrump”, “@GOPLeader”, “@FoxNews”, “@WaysandMeansGOP”, and etc. We can see that both parties are retweeting accounts that are organizations or individuals affiliated with their party, such as the House Democrats and House GOP. However, Republican members are more likely to retweet conservative news outlets such as Fox News and Newsmax, and accounts affiliated with former president Donald Trump, such as Donald Trump himself and his son Donald Trump Jr. This suggests that Republican members are more enthusiastic about its leaders, as well as conservative media outlets. On the other hand, Democratic members draw more attention on different caucuses and communities.
Next, we will analyze the most frequently quoted accounts by Congress members.
accounts_quoted_top_30 <- dbGetQuery(con, "SELECT quoting, COUNT(*) FROM tweets WHERE quoting IS NOT NULL GROUP BY quoting ORDER BY count DESC LIMIT 30")
accounts_quoted_top_30$rank <- rank(-accounts_quoted_top_30$count)
accounts_quoted_top_30
## quoting count rank
## 1 realDonaldTrump 14955 1
## 2 nytimes 8348 2
## 3 thehill 8211 3
## 4 washingtonpost 6921 4
## 5 AP 5164 5
## 6 POTUS 5157 6
## 7 GOPLeader 3825 7
## 8 CNN 3814 8
## 9 NBCNews 3704 9
## 10 WhiteHouse 3548 10
## 11 HouseGOP 3536 11
## 12 politico 3376 12
## 13 kylegriffin1 2973 13
## 14 FoxNews 2872 14
## 15 ABC 2763 15
## 16 NPR 2441 16
## 17 SteveScalise 2400 17
## 18 BillFOXLA 2076 18
## 19 CBSNews 2067 19
## 20 atrupar 2027 20
## 21 nowthisnews 2012 21
## 22 chiproytx 1978 22
## 23 axios 1931 23
## 24 WSJ 1853 24
## 25 CNNPolitics 1816 25
## 26 RNCResearch 1788 26
## 27 JoeBiden 1603 27
## 28 MSNBC 1585 28
## 29 Reuters 1566 29
## 30 mkraju 1555 30
ggplot(accounts_quoted_top_30, aes(x = count, y = reorder(quoting, count))) +
geom_col() +
geom_text(aes(label = paste0("#", rank)), hjust = -0.3) +
labs(title = "Top 30 Most Frequently Quoted Accounts by Congress Members, 2017-2023", x = "Number of Quote Tweets", y = "Account") +
scale_x_continuous(labels = scales::comma) +
theme_minimal()
Let’s also analyze with respect to party.
accounts_quoted_top_30_d <- dbGetQuery(con, "SELECT quoting, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND quoting IS NOT NULL GROUP BY quoting ORDER BY count DESC LIMIT 30")
accounts_quoted_top_30_r <- dbGetQuery(con, "SELECT quoting, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND quoting IS NOT NULL GROUP BY quoting ORDER BY count DESC LIMIT 30")
accounts_quoted_top_30_d$party <- "Democrat"
accounts_quoted_top_30_d$rank <- rank(-accounts_quoted_top_30_d$count)
accounts_quoted_top_30_r$party <- "Republican"
accounts_quoted_top_30_r$rank <- rank(-accounts_quoted_top_30_r$count)
accounts_quoted_top_30_by_party <- rbind(accounts_quoted_top_30_d, accounts_quoted_top_30_r)
accounts_quoted_top_30_by_party
## quoting count party rank
## 1 realDonaldTrump 11159 Democrat 1
## 2 nytimes 7511 Democrat 2
## 3 washingtonpost 6130 Democrat 3
## 4 thehill 5742 Democrat 4
## 5 POTUS 3875 Democrat 5
## 6 AP 3779 Democrat 6
## 7 CNN 3283 Democrat 7
## 8 NBCNews 3209 Democrat 8
## 9 kylegriffin1 2897 Democrat 9
## 10 politico 2610 Democrat 10
## 11 NPR 2284 Democrat 11
## 12 ABC 2106 Democrat 12
## 13 nowthisnews 1958 Democrat 13
## 14 atrupar 1928 Democrat 14
## 15 CBSNews 1590 Democrat 15
## 16 RepDonBeyer 1519 Democrat 16
## 17 CNNPolitics 1518 Democrat 17
## 18 RepJayapal 1499 Democrat 18
## 19 WhiteHouse 1493 Democrat 19
## 20 MSNBC 1479 Democrat 20
## 21 RepValDemings 1295 Democrat 21
## 22 RepDwightEvans 1253 Democrat 22
## 23 SpeakerPelosi 1243 Democrat 23
## 24 mkraju 1219 Democrat 24
## 25 JoeBiden 1197 Democrat 25
## 26 axios 1177 Democrat 26
## 27 ChrisMurphyCT 1163 Democrat 27
## 28 brianschatz 1080 Democrat 28
## 29 cspan 1037 Democrat 29
## 30 latimes 1011 Democrat 30
## 31 realDonaldTrump 3598 Republican 1
## 32 HouseGOP 3416 Republican 2
## 33 GOPLeader 3290 Republican 3
## 34 FoxNews 2660 Republican 4
## 35 thehill 2360 Republican 5
## 36 SteveScalise 2291 Republican 6
## 37 BillFOXLA 2067 Republican 7
## 38 WhiteHouse 2050 Republican 8
## 39 chiproytx 1951 Republican 9
## 40 RNCResearch 1758 Republican 10
## 41 DailyCaller 1397 Republican 11
## 42 AP 1307 Republican 12
## 43 POTUS 1275 Republican 13
## 44 RepThomasMassie 1245 Republican 14
## 45 JohnCornyn 1229 Republican 15
## 46 WSJ 1214 Republican 16
## 47 nypost 1138 Republican 17
## 48 dcexaminer 1127 Republican 18
## 49 HouseCommerce 1082 Republican 19
## 50 RepChipRoy 1056 Republican 20
## 51 SenateGOP 1025 Republican 21
## 52 townhallcom 925 Republican 22
## 53 CNBC 861 Republican 23
## 54 GOPoversight 806 Republican 24
## 55 RepMattGaetz 764 Republican 25
## 56 axios 730 Republican 26
## 57 nytimes 728 Republican 27
## 58 politico 720 Republican 28
## 59 JudiciaryGOP 712 Republican 29
## 60 WaysandMeansGOP 710 Republican 30
ggplot(accounts_quoted_top_30_by_party, aes(x = count, y = reorder_within(quoting, count, party))) +
geom_col(aes(fill = party)) +
geom_text(aes(label = paste0("#", rank)), hjust = c(1.3, rep(-0.3, 29), 1.3, rep(-0.3, 29))) +
facet_wrap(~party, scales = "free") +
labs(title = "Top 30 Most Frequently Quoted Accounts by Congress Members, by Party, 2017-2023", x = "Number of Quote Tweets", y = "Account") +
scale_fill_manual(values = c(D_HEX, R_HEX)) +
scale_y_reordered(labels = function(x) gsub("___.*", "", x)) +
theme_minimal() +
theme(legend.position = "none")
## Don't know how to automatically pick scale for object of type <integer64>.
## Defaulting to continuous.
Lastly, we will analyze the most common accounts mentioned by Congress members in tweets.
accounts_mentioned_top_30 <- dbGetQuery(con, "SELECT mention, COUNT(*) FROM (SELECT UNNEST(mentions) AS mention FROM tweets) AS mentions GROUP BY mention ORDER BY count DESC LIMIT 30")
accounts_mentioned_top_30$rank <- rank(-accounts_mentioned_top_30$count)
accounts_mentioned_top_30
## mention count rank
## 1 realDonaldTrump 81926 1
## 2 POTUS 81294 2
## 3 HouseDemocrats 23631 3
## 4 JoeBiden 18827 4
## 5 HouseGOP 18133 5
## 6 SpeakerPelosi 14533 6
## 7 FoxNews 11870 7
## 8 RepJayapal 10613 8
## 9 senatemajldr 9109 9
## 10 ChuckGrassley 8671 10
## 11 chiproytx 8164 11
## 12 VP 7993 12
## 13 GOPLeader 7772 13
## 14 HouseJudiciary 7436 14
## 15 AOC 7254 15
## 16 JohnCornyn 7023 16
## 17 SenSchumer 7005 17
## 18 WhiteHouse 6993 18
## 19 EPA 6838 19
## 20 SenateGOP 6780 20
## 21 GOP 6637 21
## 22 USDA 6334 22
## 23 timburchett 6182 23
## 24 KamalaHarris 6143 24
## 25 CDCgov 6009 25
## 26 MSNBC 5936 26
## 27 FoxBusiness 5756 27
## 28 SBAgov 5633 28
## 29 SteveScalise 5555 29
## 30 JoaquinCastrotx 5468 30
ggplot(accounts_mentioned_top_30, aes(x = count, y = reorder(mention, count))) +
geom_col() +
geom_text(aes(label = paste0("#", rank)), hjust = -0.3) +
labs(title = "Top 30 Most Frequently Mentioned Accounts by Congress Members in Tweets, 2017-2023", x = "Number of Mentions", y = "Account") +
scale_x_continuous(labels = scales::comma) +
theme_minimal()
Let’s also analyze with respect to party.
accounts_mentioned_top_30_d <- dbGetQuery(con, "SELECT mention, COUNT(*) FROM (SELECT UNNEST(mentions) AS mention FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D') AS mentions GROUP BY mention ORDER BY count DESC LIMIT 30")
accounts_mentioned_top_30_r <- dbGetQuery(con, "SELECT mention, COUNT(*) FROM (SELECT UNNEST(mentions) AS mention FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R') AS mentions GROUP BY mention ORDER BY count DESC LIMIT 30")
accounts_mentioned_top_30_d$party <- "Democrat"
accounts_mentioned_top_30_d$rank <- rank(-accounts_mentioned_top_30_d$count)
accounts_mentioned_top_30_r$party <- "Republican"
accounts_mentioned_top_30_r$rank <- rank(-accounts_mentioned_top_30_r$count)
accounts_mentioned_top_30_by_party <- rbind(accounts_mentioned_top_30_d, accounts_mentioned_top_30_r)
accounts_mentioned_top_30_by_party
## mention count party rank
## 1 POTUS 47214 Democrat 1.0
## 2 realDonaldTrump 36678 Democrat 2.0
## 3 HouseDemocrats 21018 Democrat 3.0
## 4 JoeBiden 12067 Democrat 4.0
## 5 RepJayapal 10436 Democrat 5.0
## 6 HouseGOP 7409 Democrat 6.0
## 7 SpeakerPelosi 6799 Democrat 7.0
## 8 HouseJudiciary 6675 Democrat 8.0
## 9 senatemajldr 6648 Democrat 9.0
## 10 AOC 5905 Democrat 10.0
## 11 KamalaHarris 5495 Democrat 11.0
## 12 RepPressley 5358 Democrat 12.0
## 13 JoaquinCastrotx 5330 Democrat 13.0
## 14 AyannaPressley 5326 Democrat 14.0
## 15 MSNBC 5311 Democrat 15.0
## 16 EPA 5241 Democrat 16.0
## 17 SenSchumer 5216 Democrat 17.0
## 18 RepRaskin 5152 Democrat 18.0
## 19 GOP 5110 Democrat 19.0
## 20 SenateGOP 4943 Democrat 20.0
## 21 CoryBooker 4925 Democrat 21.0
## 22 AppropsDems 4790 Democrat 22.0
## 23 ChrisMurphyCT 4600 Democrat 23.0
## 24 USPS 4520 Democrat 24.0
## 25 RepEspaillat 4466 Democrat 25.0
## 26 RepMaloney 4436 Democrat 26.0
## 27 HispanicCaucus 4188 Democrat 27.0
## 28 SenWarren 4172 Democrat 28.0
## 29 IlhanMN 4163 Democrat 29.0
## 30 JulianCastro 4161 Democrat 30.0
## 31 realDonaldTrump 45031 Republican 1.0
## 32 POTUS 33896 Republican 2.0
## 33 FoxNews 10754 Republican 3.0
## 34 HouseGOP 10709 Republican 4.0
## 35 chiproytx 8061 Republican 5.0
## 36 ChuckGrassley 7806 Republican 6.0
## 37 SpeakerPelosi 7689 Republican 7.0
## 38 JoeBiden 6697 Republican 8.0
## 39 timburchett 6166 Republican 9.0
## 40 JohnCornyn 6148 Republican 10.0
## 41 GOPLeader 5726 Republican 11.0
## 42 FoxBusiness 5588 Republican 12.0
## 43 SteveScalise 5234 Republican 13.0
## 44 VP 4359 Republican 14.0
## 45 RepMattGaetz 4337 Republican 15.0
## 46 Jim_Jordan 4162 Republican 16.0
## 47 RepDonBacon 4124 Republican 17.0
## 48 tedcruz 3898 Republican 18.0
## 49 SenTedCruz 3689 Republican 19.0
## 50 WhiteHouse 3599 Republican 20.0
## 51 USDA 3318 Republican 21.0
## 52 RepMcCaul 3236 Republican 22.0
## 53 HouseCommerce 3218 Republican 23.0
## 54 RepAndyBiggsAZ 3208 Republican 24.0
## 55 RepChipRoy 3016 Republican 25.0
## 56 SenRickScott 2770 Republican 26.5
## 57 WSJ 2770 Republican 26.5
## 58 realdonaldtrump 2743 Republican 28.0
## 59 RandPaul 2732 Republican 29.0
## 60 CBP 2731 Republican 30.0
ggplot(accounts_mentioned_top_30_by_party, aes(x = count, y = reorder_within(mention, count, party))) +
geom_col(aes(fill = party)) +
geom_text(aes(label = paste0("#", rank)), hjust = -0.3) +
facet_wrap(~party, scales = "free_y") +
labs(title = "Top 30 Most Frequently Mentioned Accounts by Congress Members in Tweets, by Party, 2017-2023", x = "Number of Mentions", y = "Account") +
scale_fill_manual(values = c(D_HEX, R_HEX)) +
scale_x_continuous(labels = scales::comma, limits = c(NA, max(as.integer(accounts_mentioned_top_30_by_party$count)) * 1.05)) +
scale_y_reordered(labels = function(x) gsub("___.*", "", x)) +
theme_minimal() +
theme(legend.position = "none")
In this analysis, we have explored the tweets of U.S. Congress members from 2017 to 2023. We have analyzed the sentiment of tweets by different metrics, and identified the most common words, hashtags, and accounts mentioned in tweets. Here are some key takeaways from our analysis:
Democratic Congress members tweeted more frequently than Republican members.
Republican began tweeting more after 2020, catching up with the Democrats.
Senators tweeted more frequently than House members if we consider the number of tweets per member.
The top tweeters were mostly the same throughout the years, suggesting there are some members who are consistently active on Twitter.
Democratic members have used hashtags related to healthcare, climate change, and gun control, while Republican members have used hashtags related to the border, tax cuts, and the economy.
The hashtags of Democratic members transitioned from focusing on healthcare, equality, and climate change in 2018 to focusing on the pandemic, gun violence, and social justice in 2022.
The hashtags of Republican members transitioned from focusing on the border and tax reform in 2018 to focusing on the pandemic, border, and international escalations in 2022.
The word “trump” was the most common words used by Congress members, suggesting that the former president Donald Trump was a major topic of discussion among Congress members.
Hashtags by Democratic members are more policy-oriented, while hashtags by Republican members are more event-oriented.
The top words for Democratic members were related to healthcare, rights, and communities, while the top words for Republican members were related to the border, tax cuts, and the economy.
The top words for Democratic members have been consistent over the years, while the top words for Republican members have evolved, likely reacting to developments in various issues such as inflation, and energy crisis.
Senators mentioned “bipartisan” more frequently than House members, suggesting that Senate members are more likely to discuss bipartisan issues and cooperation in their tweets.
The sentiment of tweets of Congress members has been generally positive from 2018 to 2022.
Democratic members have become more positive in their tweets over time, while Republican members have become less positive.
Senators were slightly more positive than House members in their tweets.
Democratic members have a more positive sentiment when talking about topics such as “vaccine”, “abortion”, “diversity”, “immigration”, and “LGBTQ”, while Republican members have a more positive sentiment when talking about topics such as “Trump”, “gun”, and “climate”.
Democratic members like to retweet different caucuses and communities.
Republican members like to retweet its leaders, as well as conservative media outlets.
Overall, the analysis has provided valuable insights into the communication strategies and priorities of U.S. Congress members on Twitter. I hope that this analysis will help shed light on the dynamics of political discourse on social media and inform future research on the topic.
If you have any questions or feedback, or want to build on this analysis, please feel free to create pull requests or issues on the GitHub repository. Thank you for reading!